BaconDecomposition R parity goldens by igerber · Pull Request #457 · igerber/diff-diff

igerber · 2026-05-16T17:51:26Z

Summary

Closes the deferred R parity follow-up from PR #454, "BaconDecomposition methodology audit (Goodman-Bacon 2021)" (TODO.md row removed).
Generated benchmarks/data/r_bacondecomp_golden.json from the committed benchmarks/R/generate_bacon_golden.R script against bacondecomp 0.1.1 on R 4.5.2 (3 DGP fixtures).
tests/test_methodology_bacon.py::TestBaconParityR now active (3/3 pass, no skips): TWFE coefficient parity + weights-sum parity at atol=1e-6 across all 3 fixtures; per-component estimate + weight parity at atol=1e-6 on the 2 non-remap fixtures.
Documented one structural convention divergence on always_treated_remapped: R keeps first_treat=1 as a distinct timing cohort (Later vs Always Treated rows); Python's paper-footnote-11 convention remaps those units to U and folds them into a single treated_vs_never cell per treated cohort. Aggregate is invariant per Theorem 1; per-component breakdown differs. Per-component test skipped on this fixture with explicit documentation; aggregate parity locked.
Tracker promotion: METHODOLOGY_REVIEW.md status row → **Complete** (was **Complete** (R parity pending)). Removed from In Progress prose mention + Priority Order list.
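
The aggregate checks listed above (TWFE coefficient parity and weights-sum parity at atol=1e-6) can be sketched as follows. This is a minimal illustration with made-up numbers, not the repository's test: `check_aggregate_parity` and the `(estimate, weight)` pair layout are hypothetical, and the real goldens live in the JSON file named above.

```python
import math

ATOL = 1e-6  # tolerance quoted in the PR summary


def check_aggregate_parity(py_components, r_components, twfe_py, twfe_r, atol=ATOL):
    """Return True when the aggregate quantities agree within atol.

    py_components / r_components: lists of (estimate, weight) pairs, one per
    2x2 comparison (hypothetical layout, chosen for illustration only).
    """
    py_weight_sum = sum(w for _, w in py_components)
    r_weight_sum = sum(w for _, w in r_components)
    return (
        math.isclose(py_weight_sum, r_weight_sum, rel_tol=0.0, abs_tol=atol)
        and math.isclose(twfe_py, twfe_r, rel_tol=0.0, abs_tol=atol)
    )


# Toy data: the two sides bucket the components differently (3 cells vs 2),
# yet the weight sums and the TWFE coefficients still agree.
toy_r = [(1.0, 0.1), (2.0, 0.3), (5.0, 0.6)]
toy_py = [(1.75, 0.4), (5.0, 0.6)]
twfe_r = sum(est * w for est, w in toy_r)
twfe_py = sum(est * w for est, w in toy_py)
print(check_aggregate_parity(toy_py, toy_r, twfe_py, twfe_r))
```

The toy data deliberately uses a different number of components on each side to mirror the point made about `always_treated_remapped`: aggregate parity can hold even when the per-component breakdown is not directly comparable.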

Methodology references (required if estimator / math changes)

Method name(s): BaconDecomposition (cross-language R parity validation; no algorithm changes)
Paper / source link(s): Goodman-Bacon (2021), J. Econometrics 225(2), 254-277. R reference: bacondecomp::bacon() (CRAN).
Any intentional deviations from the source (and why): One R-vs-Python convention divergence documented in new REGISTRY **Note (R parity convention divergence on always-treated)**. The aggregate TWFE coefficient + weights-sum match R at atol=1e-6; only the per-component U-bucket decomposition differs (R splits always-treated as separate type; Python remaps to U per paper footnote 11). Theorem 1's identity is invariant to this re-bucketing.

Validation

Tests added/updated: tests/test_methodology_bacon.py (per-component test now skips the always_treated_remapped fixture with explicit reason; aggregate tests unchanged). 33/33 in methodology bacon file (was 30+3 skipped); 32 in test_bacon.py; 101 across broader bacon/decompose surface (was 98+3 skipped).
Backtest / simulation / notebook evidence (if applicable): N/A (no behavior change; goldens + tracker promotion only).

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

github-actions · 2026-05-16T17:55:08Z

Overall Assessment
✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

Affected method: BaconDecomposition (Goodman-Bacon 2021). This PR does not change estimator logic, weighting, or variance code; it adds R goldens, activates parity tests, and updates tracker/docs.
The new parity evidence is internally consistent: the committed goldens include the Later vs Always Treated rows on always_treated_remapped, and the PR documents that structural R/Python difference instead of treating it as a defect.
Promoting Bacon from “R parity pending” to **Complete** and removing the Bacon TODO row is supported by the added JSON artifact and active parity coverage.
Only minor P3 items remain: one public docstring now overstates universal R parity, and one test skip message still references the removed TODO deferral.
Verification note: I could not execute tests in this environment because pytest and pandas are not installed; assessment is based on diff/static inspection.

Methodology

P3 Informational — docs/methodology/REGISTRY.md:2661 and tests/test_methodology_bacon.py:397. Impact: the only cross-language mismatch in scope is the always-treated component breakdown, and the PR handles it correctly under the review policy by documenting it in the Registry and skipping only the non-comparable per-component assertion. Concrete fix: none required.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. Removing the Bacon parity row from TODO.md is justified by the committed goldens and activated parity checks.

Security

No findings.

Documentation/Tests

P3 — diff_diff/bacon.py:1304. Impact: the new example comment says the default path “matches R bacondecomp::bacon() at atol=1e-6” without mentioning the documented always-treated convention exception, so the public docstring is broader than the actual parity contract. Concrete fix: qualify the sentence with the same exception already captured in REGISTRY.md.
P3 — tests/test_methodology_bacon.py:301. Impact: if the golden JSON is absent in a partial checkout or packaging scenario, the skip message still says parity is “deferred” and points to TODO.md, but that deferral row has been removed by this PR. Concrete fix: update the skip text to point only to benchmarks/R/generate_bacon_golden.R / benchmarks/data/r_bacondecomp_golden.json and remove the TODO reference.

R1 verdict was Looks good with 2 P3 informational items. Both addressed:

1. P3 (Documentation/Tests): `bacon_decompose()` docstring example said "matches R bacondecomp::bacon() at atol=1e-6" without mentioning the documented always-treated convention exception. Qualified the example to spell out the aggregate-vs-per-component split: aggregate parity holds for all panels at atol=1e-6, per-component parity holds when first_treat is bounded below by min(time) (no always-treated), and the divergence on always-treated panels is by convention (Python remap-to-U vs R's `Later vs Always Treated`). Cross-references the REGISTRY note for the full contract.
2. P3 (Documentation/Tests): `TestBaconParityR`'s skip message still said the goldens were "deferred until R is provisioned (see TODO.md)" but the TODO row was removed in this PR. Updated to describe the intended skip case (partial-checkout / packaging scenarios where the committed JSON is unavailable) and dropped the TODO reference.

Tests unchanged: 33/33 pass in test_methodology_bacon.py.
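
The skip behavior described in item 2 can be sketched as below. This is a hedged illustration, not the code in `tests/test_methodology_bacon.py`: `load_goldens` is a hypothetical helper, and `unittest.SkipTest` from the standard library stands in for whatever skip mechanism the real test uses. The two file paths are the ones named in this PR.

```python
import json
import unittest
from pathlib import Path

# Paths as named in the PR description, relative to the repository root.
GOLDEN_PATH = Path("benchmarks/data/r_bacondecomp_golden.json")
GENERATOR = "benchmarks/R/generate_bacon_golden.R"


def load_goldens(path=GOLDEN_PATH):
    """Load the committed R goldens, skipping when the file is unavailable."""
    if not path.exists():
        # The message names only the artifact and its generator script;
        # it no longer points at a deferral tracked elsewhere.
        raise unittest.SkipTest(
            f"R golden file {path} is unavailable (partial checkout or "
            f"packaging scenario); regenerate it with {GENERATOR}."
        )
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

Raising a skip rather than failing keeps the suite usable in stripped-down distributions while still running the parity tests whenever the committed JSON is present.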

github-actions · 2026-05-16T18:04:59Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 00ee515275bd3fe2026ba2f3612f8e58da336bba

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

Affected method: BaconDecomposition (Goodman-Bacon 2021). This PR does not change estimator logic, weighting, variance, identification checks, or defaults; it adds committed R goldens, activates parity coverage, and updates Bacon docs/trackers.
Re-review status: the two prior P3s are resolved. The public docstring now scopes R parity correctly at diff_diff/bacon.py:1304-1313, and the missing-golden skip text no longer points to the removed TODO deferral at tests/test_methodology_bacon.py:301-309.
Methodology cross-check is consistent with the paper/registry: always-treated units belong in U under paper footnote 11, while R keeps a separate Later vs Always Treated breakdown; the PR documents that convention difference and limits per-component parity to the comparable fixtures at docs/methodology/REGISTRY.md:2661-2676, docs/methodology/papers/goodman-bacon-2021-review.md:171-197, and tests/test_methodology_bacon.py:356-415.
One minor documentation nit remains: the Bacon tracker still says R parity “activates once goldens are committed” even though the goldens are committed and the tests are active at METHODOLOGY_REVIEW.md:932.
Verification note: I could not execute tests/test_methodology_bacon.py because pytest is not installed in this environment.

Methodology

Severity: P3 informational. Impact: the only R/Python mismatch in scope is the always-treated convention difference, and it is explicitly documented in the Registry as a deviation from R while keeping aggregate parity locked across all three fixtures; under the review policy this is not a defect. Concrete fix: none required. Refs: docs/methodology/REGISTRY.md:2673-2676, tests/test_methodology_bacon.py:321-415.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. Removing the Bacon R-parity TODO row is justified by the committed golden artifact and active parity coverage.

Security

No findings.

Documentation/Tests

Severity: P3. Impact: METHODOLOGY_REVIEW.md:932 still reads as if R parity will activate only after future golden-commit work, which is stale after this PR and mildly confusing for tracker readers. Concrete fix: change that line to say the 33 methodology tests are all active, including the committed R-parity tests.
No defect finding on the prior re-review items: diff_diff/bacon.py:1304-1313 and tests/test_methodology_bacon.py:301-309 now reflect the intended contract.

R2 verdict was Looks good with 1 P3 informational item. METHODOLOGY_REVIEW.md Test Coverage line read "all active; R parity activates once goldens are committed" - stale after this PR commits the goldens and activates the 3 R-parity tests. Updated to reflect the post-PR state: all 33 tests active including R-parity (with pointer to the committed JSON).

Closes the PR #454 deferred R parity follow-up (TODO.md row removed).

Generated `benchmarks/data/r_bacondecomp_golden.json` from the committed `benchmarks/R/generate_bacon_golden.R` script against `bacondecomp 0.1.1` on R 4.5.2. Three DGP fixtures: `uniform_3groups_with_never_treated`, `two_groups_no_never_treated`, `always_treated_remapped`.

Parity results at atol=1e-6 via `tests/test_methodology_bacon.py::TestBaconParityR`:
- TWFE coefficient: ✅ matches across all 3 fixtures
- Weights-sum: ✅ matches across all 3 fixtures
- Per-component: ✅ on the 2 non-remap fixtures; **structural convention divergence** on `always_treated_remapped` (skipped per-component, kept aggregate).

R keeps `first_treat=1` as a distinct timing cohort and emits `Later vs Always Treated` comparisons; Python's paper-footnote-11 convention remaps those units to `U` and folds them into a single `treated_vs_never` cell per treated cohort. The aggregate is invariant per Theorem 1: the U bucket's weight is re-allocated across nested 2x2 cells but the total weight on {cohort_k vs U} is identical. Only the per-component breakdown differs structurally between conventions.

Tracker promotions:
- METHODOLOGY_REVIEW.md: BaconDecomposition status row → **Complete** (was `**Complete** (R parity pending)`); removed from In Progress prose mention; removed from Priority Order substantive-review list; Test Coverage count refreshed (24 → 33); R Comparison Results block rewritten as **Validated**.
- docs/methodology/REGISTRY.md: Reference Implementations bullet + Verified Components checklist + Note (weight modes) updated; new Note (R parity convention divergence on always-treated) documents the convention.
- TODO.md: BaconDecomposition R parity goldens row removed.
- CHANGELOG.md: new `[Unreleased]` Added bullet for the close-out; PR-B Changed entry tightened ("intended to match" → "matching ... at atol=1e-6").
- diff_diff/bacon.py: `bacon_decompose` docstring example wording tightened from "intended to match" to "matches" with TestBaconParityR pointer.

Tests: 33/33 pass in test_methodology_bacon.py (no skips; was 30+3 skipped); 32 pass in test_bacon.py; 101 pass across the broader bacon/decompose surface (was 98+3 skipped).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-16T18:49:04Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 86facdd92c138022c81e1d399340ff70ef7a9d77

Overall assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive summary

Affected method: BaconDecomposition (Goodman-Bacon 2021). This PR does not change estimator logic, weights, variance, assumptions, or defaults; it commits R goldens, activates Bacon parity coverage, and updates methodology/tracker docs.
The prior re-review items are resolved: the public bacon_decompose() docstring now scopes R parity correctly at diff_diff/bacon.py:L1304-L1313, and the missing-golden skip text now reflects committed artifacts at tests/test_methodology_bacon.py:L301-L309.
The only Python/R mismatch in scope is the always-treated component breakdown on always_treated_remapped; that divergence is explicitly documented in docs/methodology/REGISTRY.md:L2686-L2686 and METHODOLOGY_REVIEW.md:L929-L939, so under the stated policy it is informational rather than a defect.
Minor remaining gap: the parity suite skips direct component-level validation for always_treated_remapped, so the documented R-to-Python fold-back claim is not directly asserted in tests at tests/test_methodology_bacon.py:L401-L415.
Verification note: I could not execute the Bacon methodology tests locally because this environment does not have pytest or pandas installed.

Methodology

P3 informational. Impact: the PR’s only cross-language deviation is the always-treated convention difference, and it is now explicitly documented as a Registry note with aggregate parity still locked across all three fixtures, which makes it mitigated under the review policy. Concrete fix: none required. Refs: docs/methodology/REGISTRY.md:L2670-L2686, METHODOLOGY_REVIEW.md:L929-L939, tests/test_methodology_bacon.py:L401-L415.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

P3. Impact: TestBaconParityR.test_component_estimates_match_r() fully skips the always_treated_remapped fixture, so the suite does not directly pin the documented structural claim that R’s split Later vs Always Treated + Treated vs Untreated rows collapse to Python’s single treated_vs_never row per cohort. A cohort-level regression in that fold could slip through if overall TWFE parity still holds. Concrete fix: add a fixture-specific assertion for always_treated_remapped that aggregates R’s always-treated and untreated rows by treated cohort, then compares the resulting combined weight and weighted-average estimate to Python’s treated_vs_never component for that cohort. Refs: tests/test_methodology_bacon.py:L356-L415, docs/methodology/REGISTRY.md:L2686-L2686.

R3 verdict was Looks good with 1 P3 informational item. The per-component parity test skips the `always_treated_remapped` fixture (R/Python decompose the U bucket differently by convention), and the REGISTRY note documents that aggregating R's `Later vs Always Treated` + `Treated vs Untreated` rows by treated cohort should match Python's single `treated_vs_never` component for that cohort. The reviewer flagged that the documented structural claim was not directly asserted in tests — a cohort-level regression in the fold-back could slip through under overall TWFE parity. Per memory `feedback_test_coverage_gap_treat_as_actionable`, the "test exists but doesn't directly exercise the documented surface" P3 is actionable. Added `test_always_treated_remapped_fold_back_matches_r` to `TestBaconParityR`: for each treated cohort in the remap fixture, aggregate R's `Later vs Always Treated` + `Treated vs Untreated` rows by combined weight and weight-averaged estimate, then assert both match Python's `treated_vs_never` component for that cohort at atol=1e-6. Currently passes — confirms the documented structural fold-back is exact at numerical precision. Tests: 34/34 pass in test_methodology_bacon.py (was 33; +1 new regression).
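
The fold-back aggregation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the body of `test_always_treated_remapped_fold_back_matches_r`: `fold_back_r_rows` is a hypothetical helper, the row keys (`type`, `treated`, `estimate`, `weight`) are an assumed layout for the golden rows, and the numbers are made up rather than taken from the golden file.

```python
from collections import defaultdict

# R comparison labels that Python folds into one treated_vs_never cell.
U_BUCKET_TYPES = {"Later vs Always Treated", "Treated vs Untreated"}


def fold_back_r_rows(r_rows):
    """Collapse R's U-bucket rows to one (estimate, weight) per treated cohort.

    The combined weight is the sum of the row weights; the combined estimate
    is the weight-averaged estimate over the same rows.
    """
    weight = defaultdict(float)
    weighted_estimate = defaultdict(float)
    for row in r_rows:
        if row["type"] in U_BUCKET_TYPES:
            weight[row["treated"]] += row["weight"]
            weighted_estimate[row["treated"]] += row["weight"] * row["estimate"]
    return {k: (weighted_estimate[k] / weight[k], weight[k]) for k in weight}


# Toy rows for a single treated cohort (hypothetical values).
toy_r_rows = [
    {"type": "Later vs Always Treated", "treated": 3, "estimate": 1.0, "weight": 0.10},
    {"type": "Treated vs Untreated", "treated": 3, "estimate": 2.0, "weight": 0.30},
    {"type": "Earlier vs Later Treated", "treated": 3, "estimate": 5.0, "weight": 0.60},
]
folded = fold_back_r_rows(toy_r_rows)
estimate, combined_weight = folded[3]
```

Because the folded estimate is a weight-average, `estimate * combined_weight` equals the sum of `estimate * weight` over the folded rows, which is why the aggregate is unchanged by the re-bucketing. A parity test would then compare `folded[cohort]` against Python's `treated_vs_never` component for the same cohort at atol=1e-6.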

github-actions · 2026-05-16T18:55:08Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 780d50287c1f7287786c6719eabb8edeb4da3740

Overall assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive summary

Affected method: BaconDecomposition (Goodman-Bacon 2021). This PR does not change estimator logic, weights, variance, assumptions, or defaults; it commits R parity goldens, updates the methodology/tracker text, and expands Bacon parity coverage in tests/test_methodology_bacon.py:L321-L534, docs/methodology/REGISTRY.md:L2670-L2686, and METHODOLOGY_REVIEW.md:L929-L939.
The prior re-review gap is substantially addressed: TestBaconParityR.test_always_treated_remapped_fold_back_matches_r() now directly asserts the documented R→Python fold-back for the always-treated U bucket at the treated-cohort level in tests/test_methodology_bacon.py:L468-L534.
The Python/R always-treated divergence is now explicitly documented in the Methodology Registry, so under the stated policy it is informational rather than a defect: docs/methodology/REGISTRY.md:L2683-L2686.
Residual gap only at P3: the blanket skip for always_treated_remapped still leaves that fixture’s unaffected timing-vs-timing rows without direct per-component parity assertions.
Verification note: I could not execute the Bacon methodology tests locally because this environment lacks pytest, numpy, and pandas.

Methodology

No findings. The only cross-language mismatch in scope is the always-treated convention difference, and it is explicitly documented as a Registry note rather than an undocumented deviation: docs/methodology/REGISTRY.md:L2683-L2686, METHODOLOGY_REVIEW.md:L935-L939.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

P3 Impact: test_component_estimates_match_r() still skips the entire always_treated_remapped fixture, while the new replacement test only checks the aggregated U-bucket collapse. The committed golden for that fixture still contains six unaffected timing-vs-timing rows, so a regression in those components would be caught only indirectly through aggregate parity checks. Concrete fix: on always_treated_remapped, keep direct parity assertions for the six Earlier/Later vs Treated keys and reserve the special fold-back logic only for the U-bucket rows. Refs: tests/test_methodology_bacon.py:L401-L415, tests/test_methodology_bacon.py:L468-L534, benchmarks/data/r_bacondecomp_golden.json:L124-L205.

R4 verdict was Looks good with 1 P3 informational item: the per-component parity test skipped the ENTIRE always_treated_remapped fixture, leaving the 6 timing-vs-timing rows (Earlier/Later vs Earlier/Later Treated between cohorts 3/4/5) without direct per-component parity assertions. Per memory feedback_test_coverage_gap_treat_as_actionable, this is the "test exists but doesn't directly exercise the surface" pattern and should be actionable. Narrowed the carve-out: instead of skipping the whole fixture, drop only the treated_vs_never keys from both Python and R sides (the actual U-bucket convention divergence), and keep direct atol=1e-6 parity assertions on the 6 timing-vs-timing keys. Also refined _classify_r_type to canonicalize R's "Later vs Always Treated" type string to treated_vs_never (Python folds those rows into the U bucket per paper footnote 11, so they belong to the U comparison set semantically even though R numbers them by the always-treated cohort), keeping the narrow carve-out simple. Tests: 34/34 pass in test_methodology_bacon.py (+6 directly asserted timing-vs-timing comparisons in the remap fixture vs prior coverage).
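
The narrowed carve-out can be sketched as below. This is an illustration only: `classify_r_type` and `drop_u_bucket` here are hypothetical stand-ins for the helpers in the test file, the bucket names other than `treated_vs_never` are assumed, and the keyed-dict layout is chosen for the example.

```python
def classify_r_type(r_type):
    """Map an R comparison label to a canonical bucket name (assumed mapping)."""
    if r_type in ("Treated vs Untreated", "Later vs Always Treated"):
        # Both belong to the U comparison set under the remap convention.
        return "treated_vs_never"
    if r_type == "Earlier vs Later Treated":
        return "earlier_vs_later"
    if r_type == "Later vs Earlier Treated":
        return "later_vs_earlier"
    raise ValueError(f"unrecognized R comparison type: {r_type!r}")


def drop_u_bucket(py_components, r_components):
    """Remove treated_vs_never keys from both sides and return what remains.

    Each argument maps (bucket, treated, control) -> (estimate, weight).
    """
    def keep(components):
        return {k: v for k, v in components.items() if k[0] != "treated_vs_never"}

    py_kept, r_kept = keep(py_components), keep(r_components)
    if py_kept.keys() != r_kept.keys():
        raise AssertionError("timing-vs-timing key sets differ between Python and R")
    return py_kept, r_kept


# Toy components (hypothetical values): the U bucket differs structurally,
# while the timing-vs-timing key is shared and directly comparable.
toy_py = {("treated_vs_never", 3, "U"): (1.75, 0.4), ("earlier_vs_later", 3, 4): (5.0, 0.6)}
toy_r = {
    (classify_r_type("Treated vs Untreated"), 3, "U"): (2.0, 0.3),
    (classify_r_type("Later vs Always Treated"), 3, 1): (1.0, 0.1),
    (classify_r_type("Earlier vs Later Treated"), 3, 4): (5.0, 0.6),
}
py_kept, r_kept = drop_u_bucket(toy_py, toy_r)
```

Canonicalizing `Later vs Always Treated` to the same bucket as `Treated vs Untreated` means a single filter removes every row affected by the convention difference, leaving only keys that both sides define identically.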

github-actions · 2026-05-16T19:02:21Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 8225ba06567f823ee0624cfcec1cd93b08cf79cd

Overall Assessment

Looks good — no unmitigated P0/P1 findings.

Executive Summary

Affected method: BaconDecomposition (Goodman-Bacon 2021). This PR does not change estimator math, weights, variance, identification checks, or defaults; it adds committed R goldens, updates the parity harness, and promotes the tracker/docs.
The prior re-review gap is addressed: always_treated_remapped no longer blanket-skips per-component coverage. The test now keeps the six timing-vs-timing rows under direct parity and adds a separate fold-back assertion for the documented U-bucket convention difference in tests/test_methodology_bacon.py.
The Python/R always-treated difference remains explicitly documented in the Methodology Registry, so under the stated policy it is informational rather than a defect in methodology adherence: docs/methodology/REGISTRY.md.
One new P3 issue: several docs still describe the old test shape and stale counts after the new fold-back test landed.
I could not execute the suite here because this environment lacks pytest and pandas.

Methodology

No findings. The only methodology deviation in scope is the always-treated R/Python convention split, and it is documented in docs/methodology/REGISTRY.md.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

P3 Impact: the prose is now out of sync with the actual parity harness. CHANGELOG.md, REGISTRY.md, and METHODOLOGY_REVIEW.md still say TestBaconParityR has 3 tests and that always_treated_remapped is skipped for per-component parity, but the code now adds a fourth fold-back test and only carves out treated_vs_never rows while still directly checking the six timing-vs-timing rows. The in-test docstring is stale in the same way. Concrete fix: update the release/tracker text and the test docstring to say that TestBaconParityR now has 4 tests, the methodology file count increased by one, and only the U-bucket rows are excluded from direct per-component matching on always_treated_remapped. Refs: CHANGELOG.md, docs/methodology/REGISTRY.md, METHODOLOGY_REVIEW.md, tests/test_methodology_bacon.py.

R5 verdict was Looks good with 1 P3 informational item: docs prose out of sync with the actual parity harness after R4's carve-out narrowing. CHANGELOG, REGISTRY, METHODOLOGY_REVIEW, and the fold-back test's own docstring still said TestBaconParityR has 3 tests and that always_treated_remapped is skipped for per-component parity, while the code now adds a fourth fold-back test and only carves out treated_vs_never rows while keeping direct parity on the 6 timing-vs-timing rows of that fixture.

Refreshed 6 surfaces:
- METHODOLOGY_REVIEW.md Verified Components checklist + Test Coverage count (33 -> 34) + R Comparison Results subsection for the remap fixture.
- docs/methodology/REGISTRY.md Reference Implementations bullet, Requirements checklist, and Note (R parity convention divergence) text to reflect the narrowed carve-out and the fold-back test.
- CHANGELOG.md PR-457 Added entry (4 tests, narrowed carve-out description, fold-back test mention).
- tests/test_methodology_bacon.py::test_always_treated_remapped_fold_back_matches_r docstring (no longer says the per-component test "skips" the fixture; says it carves out only the U-bucket rows).

Tests: 34/34 pass.

github-actions · 2026-05-16T19:11:10Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 8f504bdf0586d8b721de0da7a9a225aca3e33d6f

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

Affected method: BaconDecomposition (Goodman-Bacon 2021). This PR does not change estimator math, weights, variance/SE, identification checks, or defaults; the diff_diff/bacon.py change is docstring-only at diff_diff/bacon.py:1304.
The methodology-facing work is the R-parity validation surface: committed goldens for 3 fixtures, direct per-component parity on the 2 non-remap fixtures plus the 6 timing-vs-timing rows of always_treated_remapped, and a separate fold-back assertion for the documented always-treated convention split in tests/test_methodology_bacon.py:356, tests/test_methodology_bacon.py:481, and docs/methodology/REGISTRY.md:2686.
The prior re-review documentation gap is mostly resolved: REGISTRY.md, METHODOLOGY_REVIEW.md, TODO.md, and the parity test prose now reflect committed goldens and a 4-test active parity harness at docs/methodology/REGISTRY.md:2671, METHODOLOGY_REVIEW.md:929, TODO.md:73, and tests/test_methodology_bacon.py:301.
One residual P3 remains: the older Bacon audit paragraph in CHANGELOG.md still says the JSON goldens are pending and parity tests skip, which now conflicts with the new committed-goldens entry in the same file.
I could not run the parity tests locally because this environment lacks pandas.

Methodology

No findings. The only deviation in scope is the always-treated R/Python decomposition convention, and it is explicitly documented in the registry note, so under the stated review policy it is informational rather than a defect at docs/methodology/REGISTRY.md:2681 and docs/methodology/REGISTRY.md:2686.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. Removing the Bacon parity deferral from TODO.md is consistent with the newly committed golden fixture file.

Security

No findings.

Documentation/Tests

P3 Impact: CHANGELOG.md still contains a stale Bacon audit paragraph saying the JSON goldens are deferred until local R provisioning and that TestBaconParityR skips when missing. That now contradicts the new top-level changelog entry, the committed golden file, and the active 4-test parity harness, so the release notes remain internally inconsistent. Concrete fix: update or trim the older PR-B audit bullet so it no longer describes the pre-goldens state; the current truth is already captured in CHANGELOG.md:11, but contradicted by CHANGELOG.md:13.

…e-out

R6 verdict was Looks good with 1 P3 informational item: the older PR-B audit bullet at CHANGELOG.md:13 (added in PR #454) still described the pre-goldens deferral state ("JSON goldens deferred", "TestBaconParityR skips with a pointer when goldens missing", "status flipped to **Complete (R parity goldens pending)**"). That contradicts the new PR-457 bullet at CHANGELOG.md:11 (committed goldens + 4 active parity tests) within the same [Unreleased] section, so the release notes read as internally inconsistent.

Updated 3 strings in the PR-B bullet to reflect the within-release close-out:
- Status flip wording: now says the (R parity pending) caveat was closed by the parity-goldens bullet above in this same release.
- TestBaconParityR description: 4 tests, all active post-release; skips only in partial-checkout scenarios.
- (4) outcome: parity goldens deferral was closed within this release.

github-actions · 2026-05-16T19:18:06Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 1bd8b0efa3017bb93408f3d7ff0ebc62120e35fd

Overall Assessment

⚠️ Needs changes — one unmitigated P1 remains.

Executive Summary

Affected method: BaconDecomposition (Goodman-Bacon 2021). This PR does not change estimator math, weights, SEs, or defaults; it adds committed R-parity goldens, tightens parity assertions around the always-treated fixture, and promotes the Bacon methodology surface to **Complete**.
The prior AI-review P3 about stale Bacon changelog text appears resolved: the TODO deferral row is removed, the tracker status is updated, and the new parity tests/docs are wired together consistently.
[Newly identified] The authoritative methodology contract still overstates source adherence on the always-treated boundary case. REGISTRY.md now presents first_treat <= min(time) as “per paper footnote 11”, but the repo’s own methodology prose says the paper’s strict shorthand is t_i < 1 and that folding first_treat == min(time) into U is a library boundary convention.
The new R-parity coverage is otherwise well targeted: it now directly checks the 6 timing-vs-timing rows on always_treated_remapped and separately pins the documented R→Python U-bucket fold-back.
I could not execute tests/test_methodology_bacon.py locally because this environment lacks pytest, numpy, and pandas.

Methodology

[Newly identified] Severity: P1. docs/methodology/REGISTRY.md:L2619-L2619 and docs/methodology/REGISTRY.md:L2684-L2686 describe remapping first_treat <= min(time) to U as “per paper footnote 11”, and diff_diff/bacon.py:L473-L477 repeats that framing. But the repo’s own methodology prose distinguishes the paper’s strict t_i < 1 rule from the library’s broader <= min(time) boundary convention at tests/test_methodology_bacon.py:L589-L600, and METHODOLOGY_REVIEW.md:L944-L944 likewise contrasts t_i < 1 with the implemented <= min(time) rule. With this PR removing the Bacon TODO and promoting the surface to **Complete**, the authoritative REGISTRY still does not label that boundary extension as a deviation/library convention.

Impact: users reading the Bacon methodology contract are told this boundary behavior is paper-faithful when the repo’s own methodology notes say otherwise. That is an undocumented methodology deviation under the review rubric, specifically on first-period-treated cohorts such as the new always_treated_remapped parity fixture.

Concrete fix: add an explicit REGISTRY note/deviation stating that the paper puts t_i < 1 in U, while Python intentionally also folds first_treat == min(time) into U as a boundary convention because those units have no untreated observations in-panel; then align the matching wording in diff_diff/bacon.py, CHANGELOG.md, and the Bacon methodology review text.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. Removing the Bacon R-parity TODO row is consistent with the committed golden file and the new parity-test surface.

Security

No findings.

Documentation/Tests

Severity: P3. CHANGELOG.md:L13-L13 still describes tests/test_methodology_bacon.py as “~24 tests across 6 classes”, while the same PR updates METHODOLOGY_REVIEW.md:L931-L933 to 34 methodology tests and the file now contains 34 test methods.

Impact: minor release-note inconsistency about the size of the active validation surface.

Concrete fix: update the changelog phrase to 34 tests or explicitly scope ~24 as the audit-time count rather than the current post-goldens state.

Execution note: I was unable to run the Bacon methodology suite in this environment because pytest, numpy, and pandas are unavailable.

Path to Approval

Add an explicit REGISTRY note/deviation for the first_treat == min(time) boundary convention, and mirror that wording in the Bacon docstring/changelog text so the promoted BaconDecomposition methodology surface no longer overclaims paper-faithfulness on that case.

R7 surfaced a real P1: the REGISTRY presented the always-treated remap condition `first_treat <= min(time)` as "per paper footnote 11", but the paper's strict rule is `t_i < 1` (units treated *before* the first observable period). The inclusive `<= min(time)` rule additionally folds `first_treat == min(time)` cohorts into U. That is a library boundary convention, not a paper-faithful rule. The test class docstring already called this out, but the authoritative REGISTRY contract did not, which read as an undocumented methodology deviation on PRs that promote BaconDecomposition to **Complete**.

Resolution: added a new explicit `**Deviation (first-period boundary extension on always-treated remap)**` block to REGISTRY's Bacon section that:
- Names the paper's strict `t_i < 1` rule
- States the library's inclusive `<= min(time)` rule
- Explains the rationale (`first_treat == min(time)` cohorts have no untreated cell in-panel)
- Notes R does NOT apply this fold (it keeps such cohorts in their own bucket and emits `Later vs Always Treated`)
- Notes the rules coincide when `min(time) > 1`

Mirrored in:
- REGISTRY Assumption checks bullet (line 2619): now points at the new Deviation block
- REGISTRY `**Note (always-treated remap)**` (line 2684): qualifies the "per paper footnote 11" claim
- METHODOLOGY_REVIEW.md Deviations block: re-titled to include paper deviations, added the boundary entry as item 1
- `bacon_decompose()` docstring (`bacon.py:467-487`): explicit boundary-extension paragraph with REGISTRY pointer
- CHANGELOG PR-457 Added entry: explicit boundary-deviation callout

Also fixes R7 P3: CHANGELOG PR-B test count "~24 tests" updated to acknowledge the post-release 34-test count after R-parity-goldens expansion.

Tests: 34/34 pass.
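
The difference between the two boundary rules can be made concrete with a small sketch. This is not the library's implementation: `remap_to_u` is a hypothetical helper, and the strict branch reads the paper's `t_i < 1` as `first_treat < min(time)` for a panel whose first observed period is 1, which is an assumption made for this example.

```python
def remap_to_u(first_treat, min_time, inclusive=True):
    """Return True when a cohort is folded into the untreated bucket U.

    inclusive=True  -> inclusive convention: first_treat <= min(time)
    inclusive=False -> strict reading: treated before the first observed period
    """
    if inclusive:
        return first_treat <= min_time
    return first_treat < min_time


# Toy panel observed from period 1, with cohorts first treated at 0, 1 and 3.
MIN_TIME = 1
cohorts = [0, 1, 3]
inclusive_u = [g for g in cohorts if remap_to_u(g, MIN_TIME)]
strict_u = [g for g in cohorts if remap_to_u(g, MIN_TIME, inclusive=False)]
```

In this toy panel only the cohort with `first_treat == min(time)` is classified differently by the two rules, which is the boundary case the new Deviation block documents.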

github-actions · 2026-05-16T19:25:43Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 2a2c8813ad53c7189fbb99fb311dff6558f7e135

Overall Assessment

✅ Looks good — no unmitigated P0 or P1 findings.

Executive Summary

Affected method: BaconDecomposition (Goodman-Bacon 2021). This PR does not change estimator math; it commits R-parity goldens, activates parity tests, and promotes the methodology tracker/docs.
Re-review: the prior P1 on the first-period boundary convention is resolved. The authoritative methodology contract now explicitly labels first_treat == min(time) remap as a library deviation from the paper/R behavior in docs/methodology/REGISTRY.md:L2619-L2621 and docs/methodology/REGISTRY.md:L2684-L2687, mirrored in diff_diff/bacon.py:L473-L486 and METHODOLOGY_REVIEW.md:L946-L947.
The new R-parity surface is internally consistent: direct per-component parity is now claimed only for the unaffected timing-vs-timing rows of always_treated_remapped, while the documented U-bucket convention difference is tested separately via fold-back aggregation in tests/test_methodology_bacon.py:L441-L549.
Removing the Bacon parity deferral row from TODO.md is supported by the committed goldens and the active TestBaconParityR coverage.
One minor P3 remains: the METHODOLOGY_REVIEW.md priority list now skips item 13 after Bacon’s removal.
I could not execute the test suite locally because this environment lacks pytest, numpy, and pandas.

Methodology

No findings. The prior undocumented-deviation issue is resolved by the new explicit deviation language in docs/methodology/REGISTRY.md:L2619-L2621 and docs/methodology/REGISTRY.md:L2684-L2687, plus the matching docstring update in diff_diff/bacon.py:L473-L486.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. Removing the Bacon R-parity deferral from TODO.md is justified by the committed benchmarks/data/r_bacondecomp_golden.json fixture and the now-active parity tests.

Security

No findings. I did not see any secrets or sensitive data concerns in the added golden file.

Documentation/Tests

Severity: P3. Impact: METHODOLOGY_REVIEW.md’s “Priority Order” list now jumps from 12 to 14, which makes the tracker look stale after Bacon’s removal and slightly reduces scanability. Concrete fix: renumber Survey Data Support from 14. to 13. in METHODOLOGY_REVIEW.md:L1218-L1225.
Execution note: I could not run tests/test_methodology_bacon.py locally because pytest, numpy, and pandas are not installed in this environment.

R8 verdict was Looks good with 1 P3 informational item: the METHODOLOGY_REVIEW.md Priority Order list jumped from item 12 to item 14 after PR #457 removed Bacon (the prior item 1). Renumbered Survey Data Support from 14 to 13 to close the gap.

github-actions · 2026-05-16T19:31:46Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 9210aeb9e040fc10eac10b7f914daa760a404b3f

Overall Assessment

✅ Looks good — no unmitigated P0 or P1 findings.

Executive Summary

Affected method: BaconDecomposition (Goodman-Bacon 2021). This PR does not change estimator math; it commits R-parity goldens, activates parity tests, and updates methodology tracking/docs.
Re-review: the prior tracker-hygiene P3 is resolved. Bacon is now marked **Complete** and the priority list is renumbered correctly in METHODOLOGY_REVIEW.md:L79-L82 and METHODOLOGY_REVIEW.md:L1218-L1225.
The previously concerning always-treated boundary/convention gap is now explicitly documented in the authoritative surfaces: docs/methodology/REGISTRY.md:L2617-L2620, docs/methodology/REGISTRY.md:L2683-L2687, and diff_diff/bacon.py:L473-L486.
The new parity surface is scoped correctly: tests/test_methodology_bacon.py:L356-L549 keeps direct per-component parity on the unaffected timing-only rows of always_treated_remapped and separately asserts the R-to-Python U-bucket fold-back by cohort.
One minor P3 remains: the updated example comment in diff_diff/bacon.py:L1313-L1320 states the per-component parity condition with the wrong inequality relative to the library’s own first_treat <= min(time) remap rule.

Methodology

No findings. The PR’s methodology-facing changes are documentation and validation only, and the R-vs-Python always-treated difference is now explicitly documented rather than left implicit.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. Removing the Bacon TODO row is supported by the committed golden fixture in benchmarks/data/r_bacondecomp_golden.json and the active parity coverage in tests/test_methodology_bacon.py:L301-L549.

Security

No findings.

Documentation/Tests

Severity: P3. Impact: diff_diff/bacon.py:L1313-L1320 says direct per-component R parity holds when first_treat is “bounded below by min(time).” Under the library’s documented rule, the carve-out is exactly the first_treat == min(time) cohort, so this wording reverses the boundary condition and can mislead readers about when the always-treated convention divergence applies. Concrete fix: rephrase to “when no cohort has first_treat <= min(time) other than never-treated sentinels” or simply “when there are no always-treated / first-period-treated cohorts.”
Execution note: I could not run tests/test_methodology_bacon.py locally because this environment does not have pytest or pandas installed.

R9 verdict was Looks good with 1 P3 informational item: the bacon_decompose docstring example said per-component parity holds "when first_treat is bounded below by min(time)"; that is the inverse of the correct condition. Under the library's `<= min(time)` remap rule, always-treated panels are exactly the ones WITH at least one non-sentinel `first_treat <= min(time)`, so per-component parity holds when all non-sentinel cohorts have `first_treat > min(time)` (i.e. every treated cohort starts strictly after min(time), or equivalently "no first-period-treated cohorts"). Rephrased. Tests: 34/34 pass; no behavior change (docstring-only fix).
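The corrected condition can be written as a small predicate. This is a sketch only: the function name is hypothetical, and the never-treated sentinel values are assumptions made for illustration, not encodings confirmed by the library.

```python
import math

# Assumed encodings for never-treated units; not confirmed by the source.
NEVER_SENTINELS = {0, math.inf}


def direct_parity_expected(first_treats, min_time, never_sentinels=NEVER_SENTINELS):
    """True when every treated cohort starts strictly after min(time)."""
    treated = [g for g in first_treats if g not in never_sentinels]
    return all(g > min_time for g in treated)


print(direct_parity_expected([3, 4, 5, 0], min_time=1))     # True
print(direct_parity_expected([1, 3, 4, 5, 0], min_time=1))  # False
```

The second call includes a `first_treat == min(time)` cohort, so the always-treated convention divergence would apply and direct per-component parity would not be expected on every row.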

igerber · 2026-05-16T19:54:43Z

/ai-review

github-actions · 2026-05-16T19:58:21Z

🔁 AI review rerun (requested by @igerber)

Head SHA: e592a5b4154791b8356d3756734961bb253a406d

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

Re-review: the prior P3 about the bacon_decompose example’s boundary wording is resolved in diff_diff/bacon.py (L1312-L1322).
Affected method: BaconDecomposition / Goodman-Bacon (2021). This PR changes validation, registry/tracker status, and documentation; it does not change estimator math or inference code.
The always-treated / first-period-treated behavior is now explicitly documented as a deviation rather than left implicit, in docs/methodology/REGISTRY.md (L2686-L2687) and METHODOLOGY_REVIEW.md (L946-L949), so it is non-blocking under the review rubric.
The new parity surface is materially stronger: aggregate parity across all 3 fixtures, direct per-component parity on unaffected rows, and a dedicated always-treated fold-back check in tests/test_methodology_bacon.py (L481-L549).
Two minor P3s remain in the test/fixture surface: the committed golden metadata overstates full per-component parity, and the new fold-back selector is less version-robust than the adjacent classifier.
Execution note: I could not run the Bacon parity tests locally because this environment is missing pandas.

Methodology

No findings. BaconDecomposition is the affected method, and the PR’s only methodology-facing deviation, the inclusive first_treat <= min(time) remap, is explicitly labeled as a deviation in docs/methodology/REGISTRY.md (L2686-L2687) and mirrored in diff_diff/bacon.py (L473-L486).

Code Quality

No findings.

Performance

No findings.

Maintainability

Severity: P3. Impact: the new fold-back test selects R rows using case-sensitive literal substrings at tests/test_methodology_bacon.py (L520-L526), even though the neighboring classifier at tests/test_methodology_bacon.py (L373-L399) already handles cross-version bacondecomp label variation. That makes future regeneration of the committed goldens more brittle than the rest of the parity harness. Concrete fix: reuse the normalized classifier or lowercased semantic matching when selecting the R-side untreated/always-treated rows for fold-back aggregation.
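A minimal sketch of what lowercased semantic matching could look like follows. The function name is hypothetical and is not the repository's `_classify_r_type`; the sample labels are illustrative rather than copied from the committed goldens.

```python
def classify_r_type_sketch(label):
    """Map an R `type` label to a coarse bucket, ignoring case."""
    text = label.lower()
    # Check "always" first: "untreated" also contains the substring "treated".
    if "always" in text:
        return "always_treated"
    if "untreated" in text or "never" in text:
        return "untreated"
    return "timing"


for label in [
    "Treated vs Untreated",
    "Later vs Always Treated",
    "Earlier vs Later Treated",
    "later vs always treated",
]:
    print(label, "->", classify_r_type_sketch(label))
```

Because the match runs on the lowercased label, a change in capitalization between package versions would not alter which rows are selected.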

Tech Debt

No findings. Removing the Bacon TODO row is supported by the committed goldens and active parity coverage.

Security

No findings.

Documentation/Tests

Severity: P3. Impact: the committed golden metadata still says the parity target is direct per-component (treated, control, type) tuple parity plus TWFE coefficient at benchmarks/data/r_bacondecomp_golden.json (L6), but this PR intentionally replaces full per-component parity on the always_treated_remapped U-bucket rows with aggregate and fold-back assertions. That can misstate what the fixture is supposed to prove. Concrete fix: update the meta.description to describe the actual contract: aggregate TWFE/weight-sum parity across all fixtures, direct per-component parity on non-remap rows, and cohort-level fold-back parity for the always-treated U bucket.

…meta

Fresh R10 verdict was Looks good with 2 P3 informational items:

1. P3 (Maintainability): the always-treated fold-back test selected R rows via case-sensitive literal substrings ("Untreated", "Always Treated", "Later"), while the neighboring _classify_r_type classifier uses case-insensitive semantic matching. Made the selector consistent: case-insensitive matching on "untreated" / "never" / "always" tokens, so the fold-back survives bacondecomp label variation across versions.
2. P3 (Documentation/Tests): committed golden JSON's meta.description still advertised full per-component (treated, control, type) tuple parity as the contract, but PR #457 intentionally replaces that for the always_treated_remapped U-bucket rows with aggregate + fold-back parity. Updated meta.description to describe the actual three-tier contract (aggregate / direct per-component on non-remap + 6 timing-vs-timing rows / cohort fold-back for U bucket) with a pointer to the REGISTRY Notes that document the convention divergence.

Tests: 34/34 still pass.
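The cohort fold-back can be sketched as follows. The aggregation rule here (summed weight, weight-averaged estimate) is an assumption about how the R rows would be combined, not a copy of the test code; the function name, row layout, and numbers are made up for illustration.

```python
from collections import defaultdict


def fold_back_u_bucket(r_rows):
    """Collapse R's untreated and always-treated rows into one cell per treated cohort."""
    weight = defaultdict(float)
    weighted_est = defaultdict(float)
    for row in r_rows:
        text = row["type"].lower()
        if "untreated" in text or "never" in text or "always" in text:
            weight[row["treated"]] += row["weight"]
            weighted_est[row["treated"]] += row["weight"] * row["estimate"]
    # Assumes each folded cohort has nonzero total weight.
    return {
        g: {"weight": weight[g], "estimate": weighted_est[g] / weight[g]}
        for g in weight
    }


rows = [  # made-up numbers, not taken from the committed goldens
    {"treated": 3, "type": "Treated vs Untreated", "estimate": 2.0, "weight": 0.25},
    {"treated": 3, "type": "Later vs Always Treated", "estimate": 4.0, "weight": 0.25},
    {"treated": 4, "type": "Earlier vs Later Treated", "estimate": 1.0, "weight": 0.5},
]
folded = fold_back_u_bucket(rows)
print(folded)  # {3: {'weight': 0.5, 'estimate': 3.0}}
```

A weight-averaged estimate keeps each cohort's total `weight * estimate` contribution unchanged, which is consistent with the aggregate being unaffected by the re-bucketing.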

github-actions · 2026-05-16T20:05:04Z

🔁 AI review rerun (requested by @igerber)

Head SHA: a86498ee744c56dcd932484f4a9028f657c362d7

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

Re-review: the prior P3s are resolved. The committed golden metadata now describes the narrowed always-treated contract in benchmarks/data/r_bacondecomp_golden.json:L2-L6, and the fold-back selector is now semantic/case-insensitive in tests/test_methodology_bacon.py:L521-L530.
Affected method: BaconDecomposition / Goodman-Bacon (2021). This PR changes parity fixtures, tests, and methodology/tracker documentation; it does not change estimator math, weights, variance, or inference code.
Methodology-wise, the only behavior difference discussed here is the already-implemented first-period boundary extension (first_treat <= min(time)), which is now explicitly documented as a deviation in docs/methodology/REGISTRY.md:L2686-L2687 and mirrored in diff_diff/bacon.py:L473-L486, so it is non-blocking under the review rubric.
The R-parity surface is materially stronger: aggregate parity on all three fixtures, direct timing-row parity on the unaffected always_treated_remapped rows, and a dedicated fold-back assertion for the U-bucket divergence in tests/test_methodology_bacon.py:L441-L458 and tests/test_methodology_bacon.py:L481-L554.
Execution note: this was a static review only; python -m pytest and import pandas both fail in this environment.

Methodology

No findings. The changed registry/review text is consistent with the paper review’s footnote-11 distinction (t_i < 1) and with the existing inclusive remap implementation documented in diff_diff/bacon.py:L473-L486 and docs/methodology/REGISTRY.md:L2617-L2621,L2683-L2688.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

Severity: P3 [Newly identified]. Impact: the committed golden file and registry now describe the narrowed always_treated_remapped contract, but the advertised regeneration script still documents the old full per-component parity contract and still writes the old generic meta.description. Re-running benchmarks/R/generate_bacon_golden.R would reintroduce contradictory provenance for the same fixture set. References: CHANGELOG.md:L11-L13, benchmarks/data/r_bacondecomp_golden.json:L2-L6, versus benchmarks/R/generate_bacon_golden.R:L8-L12, benchmarks/R/generate_bacon_golden.R:L20-L22, benchmarks/R/generate_bacon_golden.R:L196-L200, and benchmarks/R/generate_bacon_golden.R:L221-L225. Concrete fix: update the script header/comments and the meta.description template to match the current contract: aggregate parity across all three fixtures, direct per-component parity only on the non-U rows of always_treated_remapped, and cohort-level fold-back parity for the U bucket.

…late

R11 verdict was Looks good with 1 P3 informational item: I had updated the committed JSON's meta.description in R10 to describe the narrowed contract, but the R generator script at benchmarks/R/generate_bacon_golden.R still had the old "atol=1e-6 on per-component (treated, control, type) tuples plus TWFE coefficient" description in BOTH (a) its header docstring (lines 8-22) AND (b) its meta.description value template (lines 218-225). Re-running the script would have overwritten my committed JSON polish with the old contradictory description.

Updated both surfaces to the three-tier contract:
(1) aggregate TWFE + weights-sum on all 3 fixtures;
(2) direct per-component parity on the 2 non-remap fixtures + 6 timing-vs-timing rows of always_treated_remapped;
(3) cohort fold-back parity for the U bucket on always_treated_remapped.

Pointers to REGISTRY Note (R parity convention divergence on always-treated) + Deviation (first-period boundary extension). Re-ran the R script; JSON written matches the committed text and tests remain green (4/4 in TestBaconParityR, 34/34 across the file). Script is now idempotent on its own committed output.
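Tiers (1) and (2) of the contract can be sketched as two checks at an absolute tolerance of 1e-6. The function names, the cell layout, and the numbers are hypothetical. The weighted sum of component estimates stands in for the TWFE coefficient here, which the decomposition equates; the actual tests may compare the regression coefficient directly.

```python
import math

ATOL = 1e-6


def close(a, b, atol=ATOL):
    """Absolute-tolerance comparison only (no relative term)."""
    return math.isclose(a, b, rel_tol=0.0, abs_tol=atol)


def aggregate_parity(py_cells, r_cells):
    """Tier 1: weights sum and weighted-sum estimate agree."""
    def totals(cells):
        return (
            sum(c["weight"] for c in cells),
            sum(c["weight"] * c["estimate"] for c in cells),
        )
    (p_w, p_e), (r_w, r_e) = totals(py_cells), totals(r_cells)
    return close(p_w, r_w) and close(p_e, r_e)


def component_parity(py_cells, r_cells):
    """Tier 2: same (treated, control) keys with matching estimate and weight."""
    py = {(c["treated"], c["control"]): c for c in py_cells}
    r = {(c["treated"], c["control"]): c for c in r_cells}
    if py.keys() != r.keys():
        return False
    return all(
        close(py[k][field], r[k][field])
        for k in py
        for field in ("estimate", "weight")
    )


py_cells = [  # made-up numbers for illustration
    {"treated": 3, "control": 4, "estimate": 1.0, "weight": 0.5},
    {"treated": 4, "control": 3, "estimate": 2.0, "weight": 0.5},
]
r_cells = [
    {"treated": 3, "control": 4, "estimate": 1.0 + 1e-9, "weight": 0.5},
    {"treated": 4, "control": 3, "estimate": 2.0, "weight": 0.5},
]
print(aggregate_parity(py_cells, r_cells), component_parity(py_cells, r_cells))
```

Tier (1) can hold even when tier (2) does not, for example when the two sides bucket the same contributions under different keys; that is the situation the fold-back tier is meant to cover.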

github-actions · 2026-05-16T20:10:35Z

🔁 AI review rerun (requested by @igerber)

Head SHA: a202dca3777edf892e0dae7a8202608dba854bf2

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

Re-review: the prior parity-provenance P3 is mostly resolved. The committed JSON, activated TestBaconParityR, REGISTRY updates, TODO cleanup, and tracker status now agree on the narrowed always-treated contract.
Affected method: BaconDecomposition / Goodman-Bacon (2021). This PR changes parity fixtures, tests, and methodology documentation only; it does not change estimator math, weights, variance, inference, or defaults.
The first_treat <= min(time) boundary extension is now explicitly documented as a deviation from paper footnote 11 and from R, so it is non-blocking under the review rubric.
One informational issue remains: the fixture-3 inline comment in benchmarks/R/generate_bacon_golden.R still describes the old R behavior and wrong fixture counts.
Execution note: static review only; I could not rerun the Bacon tests in this environment because pandas is unavailable.

Methodology

No findings. Affected method: BaconDecomposition. The new parity carve-out and fold-back test match the documented contract in docs/methodology/REGISTRY.md:L2670-L2688, are mirrored in diff_diff/bacon.py:L473-L495, and are exercised in tests/test_methodology_bacon.py:L356-L554.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

Severity: P3. Impact: the fixture-3 inline comment in benchmarks/R/generate_bacon_golden.R still says R groups first_treat=1 with U and that the fixture has 30 never-treated units, but the current parity contract says R keeps first_treat=1 as a distinct cohort and the code actually builds 25 never-treated units. That leaves the advertised regeneration script internally contradictory, even though the header and committed JSON metadata were updated. References: benchmarks/R/generate_bacon_golden.R:L207-L219, docs/methodology/REGISTRY.md:L2670-L2688, tests/test_methodology_bacon.py:L441-L554, benchmarks/data/r_bacondecomp_golden.json:L2-L6. Concrete fix: rewrite the fixture-3 inline comment to match the committed contract and actual fixture construction: 5 always-treated, 25 never-treated, and R emitting separate Later vs Always Treated rows instead of folding first_treat=1 into U.

R12 verdict was Looks good with 1 P3 informational item: the fixture-3 inline comment in benchmarks/R/generate_bacon_golden.R still described the old contract — said R "natively groups first_treat=1 with U" (wrong; R keeps them as a distinct cohort and emits `Later vs Always Treated`) and said "30 never-treated" (wrong; the script builds 25 never-treated). The header docstring + meta.description template were updated in R11, but this inline block-comment slipped. Rewrote the inline comment to match: (a) the actual fixture construction (5 always-treated, 25 never-treated, 3 timing cohorts at times 3/4/5); (b) the correct R behavior (separate cohort, separate `Later vs Always Treated` rows); (c) pointers to REGISTRY note + deviation block; (d) what the parity tests carve out vs fold-back.

github-actions · 2026-05-16T20:14:43Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 86c0389bfd5800985cd7beeaaa164297562f9878

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

Re-review outcome: the prior P3 documentation mismatch is resolved. benchmarks/R/generate_bacon_golden.R:L207-L222 now matches the committed fixture and the narrowed parity contract exercised in tests/test_methodology_bacon.py:L441-L554.
Affected method: BaconDecomposition / Goodman-Bacon (2021). This PR changes parity goldens, tests, and methodology/docs; it does not change estimator math, weights, variance/SE, identification checks, or defaults.
The only methodology divergence discussed here is the already-documented first-period/always-treated convention difference, explicitly labeled in docs/methodology/REGISTRY.md:L2684-L2688 and mirrored in diff_diff/bacon.py:L473-L486, so it is non-blocking under the review rubric.
TODO.md cleanup is justified: the committed golden file, active parity tests, and tracker status are internally consistent across benchmarks/data/r_bacondecomp_golden.json, tests/test_methodology_bacon.py:L301-L554, and METHODOLOGY_REVIEW.md:L915-L939.
Static review only: I could not rerun the Bacon parity tests in this environment because pytest is not installed.

Methodology

Severity: none. Impact: no undocumented methodology mismatch identified. The R/Python always-treated split and the inclusive first-period remap are both explicitly documented deviations/notes in docs/methodology/REGISTRY.md:L2670-L2688 and reflected in diff_diff/bacon.py:L473-L486. Concrete fix: none.

Code Quality

Severity: none. Impact: the parity harness is stricter than before: the carve-out now excludes only the U-bucket rows, while the six timing-vs-timing rows remain under direct per-component parity in tests/test_methodology_bacon.py:L441-L479. Concrete fix: none.

Performance

Severity: none. Impact: the PR adds fixtures, tests, and documentation only; no production-path performance regression is introduced. Concrete fix: none.

Maintainability

Severity: none. Impact: the generator script, committed goldens, registry, tracker, and bacon.py docstrings now describe the same always-treated contract, reducing future drift. References: benchmarks/R/generate_bacon_golden.R:L8-L22, L207-L222; docs/methodology/REGISTRY.md:L2670-L2688; METHODOLOGY_REVIEW.md:L915-L950; diff_diff/bacon.py:L1312-L1322. Concrete fix: none.

Tech Debt

Severity: none. Impact: removing the Bacon parity TODO is supported by committed artifacts and active test coverage, so this does not appear to be premature cleanup. References: TODO.md:L73-L90, tests/test_methodology_bacon.py:L301-L554. Concrete fix: none.

Security

Severity: none. Impact: changed files contain synthetic benchmark data and documentation only; no secret or PII exposure identified. Concrete fix: none.

Documentation/Tests

Severity: none. Impact: the previous re-review P3 is resolved; fixture-3 comments now match the actual fixture counts and R behavior in benchmarks/R/generate_bacon_golden.R:L207-L222. The new fold-back test at tests/test_methodology_bacon.py:L481-L554 also pins the documented always-treated convention divergence instead of relying on a broad skip. Concrete fix: none.

Residual risk: source inspection only; I could not execute tests/test_methodology_bacon.py here because pytest is unavailable.

igerber and others added 3 commits May 16, 2026 14:44

igerber force-pushed the feature/bacon-r-parity-goldens branch from a9e3c64 to 86facdd Compare May 16, 2026 18:45

igerber added the ready-for-ci Triggers CI test workflows label May 16, 2026

igerber merged commit 25d5ed4 into main May 16, 2026
33 of 34 checks passed

igerber deleted the feature/bacon-r-parity-goldens branch May 16, 2026 23:03

igerber added a commit that referenced this pull request May 16, 2026

Merge main into spillover-conley-wave-c-event-study to resolve [Unreleased] CHANGELOG conflict (PR #457 BaconDecomposition R parity goldens) # Conflicts: # CHANGELOG.md

d7043f0
